MoDEL: an efficient strategy for ungapped local multiple alignment
Authors
Abstract
We introduce a method for ungapped local multiple alignment (ULMA) in a given set of amino acid or nucleotide sequences. This method explores two search spaces using a linked optimization strategy. The first search space, M, consists of all possible words of a given length W defined on the residue alphabet. An evolutionary algorithm searches this space globally. The second search space, P, consists of all possible ULMAs in the sequence set, each ULMA being represented by a position vector that defines exactly one subsequence of length W per sequence. This search space is sampled with hill-climbing processes. The searches of the two spaces are coupled by projecting high-scoring results from the global evolutionary search of M onto P. The hill-climbing processes then refine the optimization by local search, using the relative entropy between the ULMA and background residue frequencies as the objective function. We demonstrate some advantages of our strategy by analyzing difficult natural amino acid sequences and artificial datasets. A web interface is available at
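The refinement step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`background`, `relative_entropy`, `project`, `hill_climb`), the projection rule (best Hamming match of a word in each sequence), the absence of pseudocounts, and the exhaustive one-sequence-at-a-time neighbourhood are all assumptions made for illustration. The evolutionary search of M is omitted; a single candidate word stands in for one of its high-scoring results.

```python
import math
from collections import Counter


def background(seqs):
    """Background residue frequencies, estimated from all sequences (assumption)."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def relative_entropy(seqs, pos, W, bg):
    """Sum over the W columns of sum_a f(a) * log2(f(a) / bg(a)), without pseudocounts."""
    n = len(seqs)
    score = 0.0
    for j in range(W):
        column = Counter(s[p + j] for s, p in zip(seqs, pos))
        for a, c in column.items():
            f = c / n
            score += f * math.log2(f / bg[a])
    return score


def project(word, seqs):
    """Map a word from M to a position vector in P.

    Hypothetical rule: in each sequence, take the first window with the
    most residues identical to the word.
    """
    W = len(word)
    pos = []
    for s in seqs:
        best = max(
            range(len(s) - W + 1),
            key=lambda p: sum(x == y for x, y in zip(word, s[p:p + W])),
        )
        pos.append(best)
    return pos


def hill_climb(seqs, pos, W, bg):
    """Local search in P: move one sequence's window at a time while the score improves."""
    pos = list(pos)
    best = relative_entropy(seqs, pos, W, bg)
    improved = True
    while improved:
        improved = False
        for i, s in enumerate(seqs):
            for p in range(len(s) - W + 1):
                trial = pos[:i] + [p] + pos[i + 1:]
                score = relative_entropy(seqs, trial, W, bg)
                if score > best + 1e-12:  # accept strict improvements only
                    pos, best, improved = trial, score, True
    return pos, best


# Toy data (invented for this example): one shared 6-mer per sequence.
seqs = ["TTGACGTAAT", "CCGACGTACC", "AAGGACGTAG"]
W = 6
bg = background(seqs)
start = project("GACGTA", seqs)
start_score = relative_entropy(seqs, start, W, bg)
final_pos, final_score = hill_climb(seqs, start, W, bg)
print(start, final_pos, round(final_score, 3))
```

Because only strict improvements are accepted and P is finite, the loop terminates at a local optimum of the relative entropy. A practical implementation would probably add pseudocounts and run many hill-climbing processes from different projected words.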
Similar resources
Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis
We study two fundamental problems concerning the search for interesting regions in sequences: (i) given a sequence of real numbers of length n and an upper bound U , find a consecutive subsequence of length at most U with the maximum sum and (ii) given a sequence of real numbers of length n and a lower bound L, find a consecutive subsequence of length at least L with the maximum average. We pre...
New features of the Blocks Database servers
Blocks are ungapped multiple sequence alignments representing conserved protein regions, and the Blocks Database consists of blocks from documented protein families. World Wide Web (http://www.blocks.fhcrc.org) and Email ([email protected]) servers provide tools for homology searching and for analyzing protein family relationships. New enhancements include a multiple alignment processor ...
An Application of the ABS LX Algorithm to Multiple Sequence Alignment
We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...
A Bayesian Insertion/Deletion Algorithm for Distant Protein Motif Searching via Entropy Filtering
Bayesian models have been developed that find ungapped motifs in multiple protein sequences. In this article, we extend the model to allow for deletions and insertions in motifs. Direct generalization of the ungapped algorithm, based on Gibbs sampling, proved unsuccessful because the configuration space became much larger. To alleviate the convergence difficulty, a two-stage procedure is introd...
SOAP: short oligonucleotide alignment program
SUMMARY We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, sm...
Journal title:
- Computational biology and chemistry
Volume 28, Issue 2
Pages -
Publication date 2004